14.4 The Cell Cycle
Fig. 14.3 Simplified schematic diagram of eukaryotic gene structure. A is the antiparallel double
helix. Rectangles represent genes and dashed lines represent intergenic sequences. B is an expansion
of the region labelled a (a single gene) in A. The shaded rectangles correspond to DNA segments transcribed into
RNA, spliced, and translated continuously into proteins. p is a promoter sequence. In reality, this
is usually more complex than a single nucleotide segment; it may comprise a sequence to which an
activator protein can bind (the promoter site proper) but, also, more distant (“upstream” from the
gene itself), one or more enhancer sites to which additional transcription factors (TF) may bind. All
of these segments together are called the transcription factor binding site (TFBS). There may be
some DNA of indeterminate purpose between p and the transcription start site (TSS) marked with
an arrow. Either several individual proteins bind to the various receptor sites, and are only effective
all together, or the proteins preassociate and bind en bloc to the TFBS. In both cases, one anticipates
that the conformational flexibility of the DNA is of great importance in determining the affinity of
the binding. To the right of the TSS: shaded regions, exons; unshaded regions, introns
genome, which does not exceed $2 \times 10^{8}$ base pairs). Another kind of repetition
occurs as the duplication, and sometimes repeated duplication, of whole genes along the
chromosome (or on another chromosome). The apparently superfluous copies tend
to acquire mutations, vitiating their ability to be translated into a functional protein,
whereupon they are called pseudogenes. 24 Gene duplication may be considered as
a form of cellular computing. 25 In the human being, satellite sequences of repetitive
DNA alone constitute about 5% of the genome; in the horse, they constitute
about 45%. Telomere sequences are further examples of repetitive DNA (in humans,
TTAGGG is repeated for 3–20 kilobases). Between the telomere and the remainder
of the chromosome there are 100–300 kilobases of telomere-associated repeats.
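
To give a feel for the numbers quoted above, the following minimal Python sketch converts a telomere length into an approximate count of TTAGGG units and finds the longest uninterrupted tandem run of that unit in a sequence. It assumes that 1 kilobase equals 1000 base pairs and that the repeats are perfect copies; the function names and the toy sequence are hypothetical, chosen only for illustration.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # human telomeric repeat unit, as quoted in the text


def repeat_count_for_length(length_bp: int, unit: str = TELOMERE_UNIT) -> int:
    """Number of complete copies of `unit` that fit into `length_bp` base pairs."""
    return length_bp // len(unit)


def longest_tandem_run(sequence: str, unit: str = TELOMERE_UNIT) -> int:
    """Number of units in the longest uninterrupted tandem run of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


if __name__ == "__main__":
    # Assumption: 1 kilobase = 1000 base pairs, perfect repeats only.
    for length_bp in (3_000, 20_000):
        print(length_bp, "bp ->", repeat_count_for_length(length_bp), "repeat units")

    # Hypothetical toy sequence: two tandem runs separated by unrelated bases.
    toy = "ACGT" + TELOMERE_UNIT * 4 + "GATTACA" + TELOMERE_UNIT * 2
    print("longest run:", longest_tandem_run(toy), "units")
```

Under these assumptions, 3–20 kilobases corresponds to roughly 500 to 3300 copies of the hexamer. Real telomeres contain variant repeats, so an exact-match search of this kind would tend to underestimate the run length.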
14.4.3 The C-Value Paradox
Well before genome sequence information became available, it was clear that the
amount of DNA in an organism’s cells (the C-value; more precisely it is the mass
of DNA within a haploid nucleus) did not correlate particularly well with the organism’s
complexity, and this became known as the “C-value paradox”. Examination of
24 For a concrete example, see Hittinger and Carroll (2007).
25 Shapiro (2005).